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Abstract 

Our article is concerned with adaptive sampling schemes for Bayesian inference that 
update the proposal densities using previous iterates. We introduce a copula based 
proposal density which is made more efficient by combining it with antithetic variable 
sampling. We compare the copula based proposal to an adaptive proposal density based 
on a multivariate mixture of normals and an adaptive random walk Metropolis proposal. 
We also introduce a refinement of the random walk proposal which performs better for 
multimodal target distributions. We compare the sampling schemes using challenging 
but realistic models and priors applied to real data examples. The results show that 
for the examples studied, the adaptive independent Metropolis-Hastings proposals are 
much more efficient than the adaptive random walk proposals and that in general the 
copula based proposal has the best acceptance rates and lowest inefficiencies. 
Keywords: Antithetic variables; Clustering; Metropolis-Hastings; Mixture of normals; 
Random effects. 



1 Introduction 



Bayesian inference using Markov chain Monte Carlo simulation methods is used extensively 
in statistical applications. In this approach, the parameters are generated from a proposal 
distribution, or several such proposal distributions, with the gener ated proposals accepted 



TiernevI (|l994l ). 



or rejected using the Metropolis-Hastings method; see for example 

In adaptive sampling the parameters of the proposal distribution are tuned by using pre- 
vious draws. Our article deals with diminishing adaptation schemes, which means that the 
difference between successive proposals converges to zero. In practice, this usually means 
that the proposals themselves eventually do not change. Impor tant theoretica l and practical 



contri b utions to dimini s hing adaptation samp l ing w e re made by 



2001 



Gasemvil (120031) 



Andrieu and Moulines 



Andrieu and RobertI (I200lh. 



(120061 1 



Holden 



1998) 



Haario et al 



Andrieu, Moulines. and Doucet 



Roberts and Rosenthal! (|2007l ) and 



(12005 ). 



Roberts and Rosenthal 



(j2006l ) . The adaptive randor n walk Metropolis method was 



with f urthe r contributions by 



(|2005l ) and 



Atchade and Rosenthal (2005). 



Roberts and Rosenthall (|2006l ) 



p roposed by 



Haario et al 



2001 



Andrieu, Moulines. and Doucet 



Giordani and KohnI (j2008l ) propose an adaptive 



independent Metropolis-Hastings method with a mixture of normals proposal which is es- 
timated using a clustering algorithm. 

Although there is now a body of theory justifying the use of adaptive sampling, the 
construction of interesting adaptive samplers and their empirical performance on real ex- 
amples has received less attention. Our article aims to fill this gap by introducing a t-copula 
based proposal density. An antithetic version of this proposal is also studied and is shown 
to increase efficiency when the a cceptance rate is above 70%. We also refine the adaptive 



Metropolis-Hastings proposal in 



Roberts and Rosenthall (j2006l ) by adding a heavy tailed 



component to allow the sampling scheme to traverse multiple modes more easily. As well 
as being of interest in its own right, in some of the examples we have also used this refined 
sampler to initialize the adaptive independent Metropolis-Hastings schemes. We study the 
performanc e of the above adap t ive pr oposals, as well as the adaptive mixture of normals 



proposal of 



Giordani and Kohn 



(|2008l ). for a number of models and priors using real data. 



Th e models and prior s produce challenging but realistic posterior target distributions. 



Silva et al. 



(|2008l ) is a longer version of our article that considers some alternative 



versions of our algorithms and includes more details and examples. 

2 Adaptive sampling algorithms 

Suppose that 7r(x) is the target density from which we wish to generate a sample of obser- 
vations, but that it is computationally difficult to do so directly. One way of generating the 
sample is to use the Metropolis-Hastings method, which is now described. Suppose that 
given some initial xq we have generated the j — 1 iterates xi, . . . , Xj-i. We generate Xj from 
the proposal density qj{x; z) which may also depend on some other value of x which we call 
z. Let be the proposed value of Xj generated from qj{xj-i] x). Then we take Xj = x^ 
with probability 



a(Xj^i;xn = mm<^ 1, — — ' 



(1) 



and take Xj = xj^i otherwise. If qj{z;x) does not depend on j, then under appropriate 
regularity conditions we can show th a t the sequence of iterates Xj converges to draws from 



the target density 7r(x). See 



Tierneyl (119941 ) for details. 
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In adaptive sampling the parameters of qj{z; x) are estimated from the iterates xi, . . . , Xj-2- 
Under appropriate regularity conditions t he sequence of iterates Xj,. 7 > 1, converges to 



draws from the target distribution tt(x ). See 



pood ) and 



Giordani and Kohn 



(12003). 



Roberts and Rosenthall pOOTI ) 



Roberts and Rosenthal 



We now describe the adaptive sampling schemes studied in the paper. 



2.1 Adaptive random walk Metropolis 



The adaptive random walk Metropolis proposal of iRoberts and Rosenthall (|2006l ) is 



i{z; X) = U}in<pdix; Z, KiSi) + UJ2n(l>dix; Z, fi:25^2n) 



(2) 



where d is the dimension of x and (j)d{x; z, S) is a multivariate d dimensional normal den- 
sity in X with mean z and covariance matrix S. In ([2]), = 1 for n < uq, with no 
representing the initial iterations, uin = 0.05 for n > uq with u)2n = 1 — ki = 
O.l^/d, K2 = 2.38^ /(i, Si is a con s tant c ovariance matrix, which is taken as the identity 



matrix by [Roberts and Rosenthall (|2006l ) but can be based on the Laplace approximation 
or some other estimate. The matrix is the sample covariance matrix of the first n — 1 
iterates. The scalar ki is meant to achieve a high acc eptance rate by mov ing the sampler lo- 



19971 ) for a random walk 



cally, while the scalar K2 is considered to be optimal (jRoberts et al. 
proposal when the target is multivariate normal. We note that the acceptance probability 
(HI) for the adaptive random walk Metropolis simplifies to 



a(x, z) = min ( 1 



7r(£) 
ir{x) 



(3) 
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We refine the two component random walk Metropolis proposal in ([2]) by adding a third 
component with = Ti2n and with K3 = 25 ^ ki,K2- We take cj3„ = if n < ng, 
"^Sra = 0.05 for n > no and u}2n = 1 — ^^in — ^^Sn- Alternatively, the third component can be 
a multivariate t distribution with small degrees of freedom. We refer to this proposal as the 
three component adaptive random walk. The purpose of the heavier tailed third component 
is to allow the sampler to explore the state space more effectively by making it easier to 
leave local modes. 

To illustrate this issue we consider the performance of the two and three component 
adaptive random walk samplers when the target distribution is a two component and five 
dimensional multivariate mixture of normals. Each component in the target has equal prob- 
ability, the first component has mean vector /ii = (—3, . . . , —3)^ and the second component 
has mean vector fi2 = (3, . . . , 3)"^". Both components have identity covariance matrices. 
For the three component adaptive random walk we choose K3 = 4^. The starting value is 
xq = Hi for both adaptive random walk samplers. 

Figured] compares the results and shows that the two component adaptive random walk 
fails to explore the posterior distribution even after 500, 000 iterations, whereas the three 
component adaptive random walk can get out of the local modes. 



2.2 A mixture of normals based proposal density 

r 



he p roposal density of the adaptive independent Metropolis-Hastings approach of iGiordani and Kohn 



(|2008l ) is a mixture with four terms of the form 



4 4 
Qn{x) = y^^UJjngj{x\Xjn) , LOjn > 0, for j = 1, . . . ,4 and ^OJjn = 1 , (4) 

j=l j=l 
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Iterations 

(a) RWMiwrong marginal 
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(b) RWMSCxorrect marginal 
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(d) RWM3C: correct marginal 



Figure 1: Plot of the iterates and histograms of a sample from a marginal distribution of a 
bimodal 5-dimensional mixture of normals distribution using the two component adaptive 
random walk Metropolis (left panels), and the three component adaptive random walk 
Metropolis (right panels). 

with Xjn the parameter vector for the density gjn{x', Xjn)- The sampling scheme is run in 
two stages, which are described below. Throughout each stage, the parameters in the first 
two terms are kept fixed. The first term gi{x\\in) is an estimate of the target density and 
the second term g2{x\X2n) is a heavy tailed version of gi{x\Xin)- The third term g3{x\X3n) 
is an estimate of the target that is updated or adapted as the simulation progresses and 
the fourth term 5'4(a;|A4„) is a heavy tailed version of the third term. In the first stage 
ginix',Xin) is a Laplace approximation to the posterior if it is readily available and works 
well; otherwise, gin{x; Ai„) is a Gaussian density constructed from a preliminary run of 1000 
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iterates or more of the three component adaptive random walk. Throughout, g2ix\X2n) has 
the same component means and probabihties as 5i(x|Ai„), but its component covariance 
matrices are ten times those of gi{x\Xin)- The term gsixlX^n) is a mixture of normals and 
g4{x\X^n) is also a mixture of normals obtained by taking its component probabilities and 
means equal to those of 53(x|A3„), and its component covariance matrices equal to 20 times 
those of 5'3(x|A3n). The first stage begins by using gi{x\Xin) and g2ix\X2n) only with, for 
example, coin = 0.8 and LJ2n = 0.2, until there is a sufficiently large number of iterates to 
form (73(x|A3„). After that we set ujin = 0.15, (jJ2n = 0-05, u^n = 0.7 and ujin = 0.1. We 
begin with a single normal density for 53(x|A3„) and as the simulation progresses we add 
more components up to a maximum of four according to a schedule that depends on the 
ratio of the number of accepted draws to the dimension of x. See Appendix El 

In the second stage, 5i(x|Ai„) is set to the value of (73(x|A3„) at the end of the first stage 
and g2{x\X2n) and 54(x|A4„) are constructed as described above. The heavy-tail ed densities 



g2(x\X 2n) and g4(x|A4„) are included as a defensive strategy, as suggested bv iHesterberg 



19951 ) , to get out of local modes and to explore the sample space of the target distribution 
more effectively. 

It is too computationally expensive to update gsixlX^n) (and hence g4{x\X4n)) at every 
iteration so we update them according to a schedule that depends on the problem and the 
size of the parameter vector. See Appendix lAl 



Our article estimates t 
nent using the method of 



re multivariate mixture of normals density for the third compo- 



Giordani and KohnI (j2008l ) who identify the marginals that are 



not symmetric and estimate their joint density by a mixture of normals using k-harmonic 
means clustering. We also estimated the third component using stochastic approximation 



7 



but without first identifying tfie marginals tliat are not symmetric. We studied the per- 
formance of both approaches to fitting a mixture of normals and found that the clustering 
based approach was more robust in the sense that it does not require tuning for particu- 
lar data sets to perform well, whereas for the more challenging target distributions it was 
necessary to tune the parameters of the stochastic approximation to ob tain optimal result s 



The results for the stochastic approximation approach are reported in iSilva et al. 



(12003) 



We note that est imating a mixture of normals using th e EM algorithm als o does not re 



quire tuning; see IMcLachlan and Peell (|2000l ) , as well as iTitteringtonI (| 19841 ) for the online 



EM. However, the EM algorithm is more sensitive to starting values and is more prone to 
converge to degenerate solutions, particul arly in an MCMC context where small clusters of 



identical observations arise naturally; see 



Giordani and KohnI (|2008l ) for a discussion. 



2.3 A mixture of a t copula and a multivariate t proposal distribution 

The third proposal distribution is a mixture of a i copula and a multivariate t distribution. 
We use the t copula as the major component of the mixture because it provides a flexible 
and fast method for estimating a multivariate density. In our applications this means that 
we assume that after appropriately transforming each of the parameters, the joint posterior 
density is multivariate t. Or more accurately, such a proposal density provides a more 
accurate estimate of the posterior density than using a multivariate normal or multivariate 
t proposal. The second component in the mixture is a multivariate t distribution with 
low degrees of freedom whose purpose is to help the sampler move out of local modes and 
explore the parameter space more effectively. 
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Jod (|1997l ) andlNelseij (120061') provide a n introduction to copulas, with details of the t 



copula given in lDemarta and McNeill (|2005l ). 

Let t(i^y{z, I /X, S) be a d-dimensional t density with location /x, scale matrix S and 
degrees of freedom v and let Tci^i,{'z\^i^'S) be the corresponding cumulative distribution 
function. Let fj{xj\6j) and Fj{xj\6j) be the probability density function and the cumulative 
distribution function of the jth marginal, with Oj its parameter vector. Then the density 
of the t copula based distribution for x is 



n,=i*i,i^(^i;o>i) ,=1 



(5) 



where Zj is determined by Ti^,^{zj;0,l) = Fj{xj\9j), j = l,...,d. For a given sample of 
observations, or in our case a sequence of draws, we fit the copula by first estimating each 
of the marginals as a mixture of normals. For the current degrees of freedom, fc, of the t 
copula, we now transform each observation x = (xi, . . . , Xd) to z = (zi, . . . , z^) using 



Ti,uc{zj;0,l) = Fj{xj\Oj) for j = l,...,d. 



(6) 



This produces a sample of d dimensional observations z which we use to estimate the scale 
matrix as the sample correlation matrix of the z's. Given the estimates of the marginal 
distributions and the scale matrix S^, we estimate the degrees of freedom v by maximizing 
the profile likelihood function given by ([5]) over a grid of values u = {3, 5, 10, 1000}, with 
u = 1000 representing the Gaussian copula. We could use instead the grid of v values in 
([6]), but this is more expensive computationally and we have found our current procedure 
works well in practice. 



The second component of the mixture is a multivariate t distribution with its degrees of 
freedom Vf = f> fixed and small, and with its location and scale parameters estimated from 
the X iterates using the first and second sample moments. The t copula component of the 
mixture has a weight of 0.7 and the multivariate t component a weight of 0.3. 

To draw an observation from the t copula, we first draw z from the multivariate t 
distribution with location and scale matrix 5]^ and degrees of freedom v. We then use a 
Newton- Raphson root finding routine to obtain Xj for a given Zj from ([6]) for j = 1, . . . , d. 
The details of the computation for the t copula and the schedule for updating the proposals 
is given in Appendix lAl 



I 

Usin g antithe t ic var iables in simulations often increases sampling efficiency; e.g. lRubinstein 



(|l98lh . 



TiernevI (jl994l ) proposes using antithetic variables in Markov chain Monte Carlo 
simulation when the target is symmetric. We apply antithetic variables to the copula based 
approach which generalizes Tierney's suggestion by allowing for nonsymmetric marginals. 
To the best of our knowledge, this has not been done before. The antithetic approach is 
implemented as follows. As above, we determine probabilistically whether to sample from 
the copula component or the multivariate t component. If the copula component is chosen, 
then X is generated as above and x = — x is also computed. The values x and x are then 
transformed to z and z respectively and are accepted or rejected one at a time using the 
Metropolis-Hastings method. If we decide to sample from the multivariate t component to 
obtain z, we also compute z = 2/x — z, where ^ is the mean of the multivariate t, and accept 
or reject each of these values one at a time using the Metro polis-Hastings metho d. 



We note that to satisfy the conditions for convergence in 



Giordani and KohnI (|2008l ) we 



would run the sampling scheme in two stages, with the first stage as above. In the second 
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stage we would have a three component proposal, with the first two components the same 
as above. The third component would be fixed throughout the second stage and would be 
the second component density at the end of the first stage. The third component would 
have a small probability, e.g. 0.05. However, in our examples we have found it unnecessary 
to include such a third component as we achieve good performance without it. 

3 Algorithm comparison 

This section studies the five algorithms discussed in Section [2j The two component adaptive 
random walk metropolis (RWM) and the three component adaptive random walk Metropolis 
(RWM3C) are described in Section [2Tl The adaptive independent Metropolis-Hastings with 
a mixture of normals proposal distribution fitted by clustering (IMH-MN-CL) described in 
Section 12.21 The adaptive independent Metropolis-Hastings which is a mixture of a t copula 
and a multivariate t proposal, with the marginal distributions of the copula estimated by 
mixtures of normals that are fitted by clustering (IMH-TCT-CL). The fifth sampler is the 
antithetic variable version of IMH-TCT-CL, which we call IMH-TCT-CL-A. These proposals 
are described in Section 12.31 

Our study compares the performance of the algorithms in terms of the acceptance rate 
of the Metropolis-Hastings method, the inefficiency factors (IF) of the parameters, and an 
overall measure of effectiveness which compares the times taken by all samplers to obtain the 
same level of accuracy. We define the acceptance rate as the percentage of accepted values of 
each of the Metropolis-Hastings proposals. We define the inefficiency of the sampling scheme 
for a given parameter as the variance of the parameter estimate divided by its variance when 
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the sampling scheme generates independent iterates. We estimate the inefficiency factor as 



IF = 1 + 2Y^^Ktr(^^)pj (7) 



where pj is the estimated auto correlation at la g j and the truncated kernel function Ktr{x 



1 if Ixl ^ 1 and otherwise ( Andrews 



1991 



). As a rule of thumb, the maximum number 



of lags T is given by the lowest index j such that \pj\ < 2/vM with M being the sample 
size use to compute pj. We define the equivalent sample size as ESS = N/IF, where 
N = 10000, which can be interpreted as N iterates of the dependent sampling scheme are 
equivalent to ESS independent iterates. The acceptance rate and the inefficiency factor do 
not take into account the time taken by a sampler. To obtain an overall measure of the 
effectiveness of a sampler, we define its equivalent computing time ECT = N x IF x T, 
where T is the time per iteration of the sampler and = 100000. We interpret ECT as the 
time taken by the sampler to attain the same accuracy as that attained by N independent 
draws of the same sampler. For two samplers a and b, ECTa/ ECTb is the ratio of times 
taken by them to achieve the same accuracy. 

We note that the time per iteration for a given sampling algorithm depends to an impor- 
tant extent on how the algorithm is implemented, e.g. language used, whether operations 
are vectorized, which affects ECT but not the acceptance rates nor the inefficiencies. 
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3.1 Logistic Regression 

This section applies the adaptive sampUng schemes to the binary logistic regression model 



exp (x'-P) 
1 + exp ' 



using three different priors for the vector of coefficients /3. The first is a non-informative 
multivariate normal prior, 

(3 ~ iV(0, 10^/) . (9) 

The second is a normal prior for the intercept /3o ~ -^(0, 10^), and a double exponential, or 
Laplace prior, for all the other coefficients, 

ft~^-p(-^). (10) 

The regression coefficients are assumed to be independent a priori. We note that this is the 



prior implicit in the lasso (jTibshirani 



19961 1. The prior for r is IG(0.01, 0.01), where IG(a, b) 
means an inverse gamma density with shape a and scale b. The double exponential prior 
has a spike at zero and heavier tails than the normal prior. Compared to their posterior 
distributions under a diffuse normal prior, this prior shrinks the posterior distribution of 
the coefficients close to zero to values even closer to zero, while the coefficients far from zero 
are almost unmodified. In the adaptive sampling schemes we work with log(r) rather than 
r as it is unconstrained. 

The third prior distribution takes the prior for the intercept as /3o ~ -^(0, 10^) and the 

13 



prior for the coefficients /3j,j > 1, as the two component mixture of normals, 



~ ioMP^o, t|) + (1 - uj)M(^j;0, tD 



(11) 



George and McCulloch 



(|l993l ) 



with the regression coefficients assumed a priori independent, 
suggest using this prior for Bayesian variable selection, with r| and small and large 
variances that are chosen by the user. In our article their values are given for each of the 
examples below. The prior for oo is uniform. In the adaptive sampling we work with the 
logit of oj because it is unconstrained. 



Labor force participation by women 



This section models the probability of labor force participation by women, i nlf , in 1975 



as a f unction of the covariates listed in Table [TJ This data set is discussed by 



Wooldridge 



(|2004l ) ■ p. 537 and has a sample size of 753. 



Table 1: Variables used in labor force participation data regression 
inlf : 1 if the woman is in labor force in 1975 and otherwise; 

kidsltQ : 1 if kids are with less than 6 years and otherwise; 
kidsgeQ : 1 if kids are between 6 and 18 years and otherwise; 
age : Age of the woman; 

educ : Years of schooling; 

hushrs : Hours worked by husband in 1975; 
huswage : Husband's hourly wage in 1975; 
mtr : Federal marginal tax rate facing woman; 

exper : Actual labor market experience; and 

nwifeinc : family income {famine) in 1975 minus wage times hour divided by 1000 
{{famine — wage x hours)/1000). 



We ran all the adaptive sampling schemes presented in Section [2] for all three prior distri- 
butions. Our targets are 12 dimensional (normal diffuse prior) and 13 dimensional (double 

14 



exponential prior and mixture of normals prior) posterior distributions. Our starting values 
and initial proposal distributions for all the adaptive algorithms were obtained by fitting 
a generalized linear model using the function gimfit in MATLAB. We used r| = 0.01 and 
= 10000 for the mixture of normals prior. Table [2] summarizes the posterior distributions 
for the three priors. 

Table 2: Summary statistics for the logistic regression applied to the labor force participa- 
tion data under normal, double exponential (with 9 = log(T)) and mixture of normals (with 
9 = logit(a;)) prior distributions. 



Parameter 


Normal 


Double Exponential 


Mixture of Normals 




Mean 


S. 


Dev. 


Mean 


S. 


Dev. 


Mean 


S. 


Dev. 


9 








0.6107 





3973 


1.3208 





7118 


intercept 


22.4612 


3 


1836 


16.5521 


3 


7459 


23.6789 


2 


8201 


kidsltG 


-1.0685 





2200 


-1.0885 





2157 


-1.1618 





2168 


kidsgeG 


0.3347 





0862 


0.2864 





0884 


0.1763 





0649 


age 


-0.0688 





0164 


-0.0703 





0158 


-0.0790 





0151 


educ 


0.1521 





0492 


0.1612 





0480 


0.1225 





0424 


hushrs 


-0.0010 





0002 


-0.0009 





0002 


-0.0008 





0002 


huswage 


-0.2587 





0522 


-0.2220 





0530 


-0.1851 





0436 


mtr 


-23.2281 


3 


5870 


-16.2031 


4 


3158 


-25.1363 


3 


0601 


exper 


0.7621 





1584 


0.8448 





1648 


0.1944 





0584 


nwifeinc 


-0.1355 





0241 


-0.1074 





0254 


-0.1211 





0212 


exper"^ 


-0.0030 





0012 


-0.0029 





0012 


-0.0026 





0011 


mtr X exper 


-0.8276 





2219 


-0.9472 





2312 


-0.0419 





0781 



Table [3] shows the acceptance rates, inefficiencies, the equivalent sample sizes and the 
equivalent computing times of the adaptive sampling schemes for the three prior distribu- 
tions. 

Although the acceptance rates for the three component adaptive random walk are higher 
than for the two component adaptive random walk, the results are more ambiguous when 
comparing their inefficiencies. The inefficiencies for all the adaptive independent Metropolis- 
Hastings schemes are much lower than the two adaptive random walk Metropolis schemes. 
The copula based proposal distribution performed the best in terms of the acceptance rates 

15 



Table 3: Acceptance rates, inefficiencies (IF) based on tlie truncated kernel, equivalent 
sample size (ESS=10000/IF) and ECT = IF x time for 100 000 iterations for the logistic 
regression model applied to the labor force participation data. 







Inefficiency 


Equivalent sample size 


ECT 


Algorithm 


A. Rate 


Min 


Median 


Max 


Min 


Median 


Max 


Min 


Median 


Max 




Normal Prior 


RWM 


16.9 


63.149 


66.419 


69.542 


144 


151 


158 


1696 


1784 


1868 


RWM3C 


30.4 


45.098 


46.744 


48.685 


205 


214 


222 


1219 


1264 


1316 


IMH-MN-CL 


67.1 


1.981 


2.188 


2.426 


4121 


4571 


5049 


191 


211 


234 


IMH-TCT-CL 


76.5 


1.695 


1.761 


1.918 


5213 


5680 


5900 


161 


167 


182 


IMH-TCT-CL-A 


79.2 


0.749 


0.836 


0.938 


10655 


11959 


13349 


202 


226 


253 




Double exponential Prior 


RWM 


16.7 


68.437 


71.198 


75.024 


133 


140 


146 


1907 


1984 


2090 


RWM3C 


30.1 


47.749 


50.993 


52.587 


190 


196 


209 


1340 


1431 


1476 


IMH-MN-CL 


66.6 


2.080 


2.185 


2.415 


4140 


4578 


4807 


258 


271 


299 


IMH-TCT-CL 


76.2 


1.753 


1.874 


2.614 


3826 


5336 


5706 


277 


296 


413 


IMH-TCT-CL-A 


77.7 


1.007 


1.182 


1.896 


5273 


8460 


9935 


263 


308 


495 




Mixture of Normals Prior 


RWM 


25.9 


36.243 


44.711 


46.234 


216 


224 


276 


1230 


1517 


1568 


RWM3C 


29.7 


49.411 


52.163 


79.618 


126 


192 


202 


1681 


1774 


2708 


IMH-MN-CL 


68.6 


1.986 


2.126 


2.642 


3786 


4703 


5036 


204 


218 


271 


IMH-TCT-CL 


76.7 


1.643 


1.776 


2.030 


4926 


5630 


6085 


352 


381 


435 


IMH-TCT-CL-A 


81.8 


0.661 


0.733 


1.136 


8800 


13651 


15120 


164 


182 


282 



and the inefficiency factors, and its antithetic version performed even better. 



Home mortgage disclosure act 



The home mortgage disclosure act (HMDA) dat a set relates to mor t gage 



Boston in 1990. It is discussed i n Section 11.4 of 
analyzed by many authors, e.g. iMegbolugbe and Card ( 



Stock and Watson 



applications in 



(j2007l ) and has been 



19931 ) ■ The dependent variable is 



deny, which is 1 if a mortgage application is denied and otherwise. The sample size is 
2380 with the covariates listed in Table [H 

Our starting values and initial proposal distributions for all the adaptive algorithms 
were obtained by fitting a generalized linear model using the function gimfit in MATLAB. 
We used rj = 0.01 and t£ = 10000 for the mixture of normals prior. Table [5] summarizes 
the estimation results and Figure [2] shows the two marginals that are bimodal under the 
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Table 4: Covariates used in the HMDA regression 



pirat : Scaled total obligation to income ratio; 
black : 1 if African-American and otherwise; 
cored : Credit history of consumer payments: 

1 if no slow pay, 

2 if one or two slow pay accounts, 

3 if more than two slow pay accounts, and 

4 if insufficient credit history; 
mcred : Mortgage history: 

1 if no late mortgage payments, 

2 if no mortgage payment history, 

3 if one or more late mortgage payments, and 

4 if More than two late mortgage payments; 
pubrec : 1 if there is a bankruptcy public record and otherwise; 
denpmi : 1 if a private mortgage insurance has been denied and otherwise; 
selfemp : 1 if self employed and otherwise; 

married : 1 if married and otherwise; 

hischl : 1 if number of years of schooling ^ 12 and otherwise; 
Ivtmed : (loan-to- value ^ 80%) x (loan-to- value ^ 95%); 
Itvhigh : loan-to- value > 95%; 

ccredS : 1 if credit history of consumer payments is equal to 3 and otherwise; 
ccred4: : 1 if credit history of consumer payments is equal to 4 and otherwise; 
ccred5 : 1 if credit history of consumer payments is equal to 5 and otherwise; and 
ccredQ : 1 if credit history of consumer payments is equal to 6 and otherwise. 



mixture of normals prior. 

Table [6] shows the acceptance rates, inefficiencies, the equivalent sample sizes and the 
equivalent computing time of the adaptive sampling schemes for the three prior distribu- 
tions. Both adaptive random walk Metropolis algorithms have much higher inefficiencies 
than the adaptive independent Metropolis-Hastings algorithms, especially under the mix- 
ture of normals prior distribution. The copula based proposal has the highest acceptance 
rates and lowest inefficiencies, especially for the mixture of normals prior. The reason for 
this may be that this prior produces a multimodal posterior distribution. 

We remark that we monitored the updates of the proposal distributions through all 
iterates of all algorithms using the t copula based sampler and noted that most of the time 
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Table 5: Summary statistics for the logistic regression applied to the home mortgage disclo- 
sure act data under normal, double exponential (with 6 = log(r)) and mixture of normals 
(with 6 = logit(a;)) prior distributions. 



Parameter 


Normals 


Double Exponential 


Mixture of Normals 


Mean 


S. 


Dev. 


Mean 


S. 


Dev. 


Mean 


s. 


Dev. 


e 


_ 




_ 


0.1637 





2890 


0.8536 





5573 


intercept 


-4.9153 





6744 


-4.4802 





6342 


-5.0159 





3489 


pirat 


4.8068 





7904 


4.2074 





7831 


4.8814 





7589 


black 


0.6042 





1797 


0.5955 





1766 


0.4081 





3190 


cored 


0.7326 





2134 


0.4498 





1464 


0.2718 





0403 


mcred 


0.2215 





1456 


0.2125 





1428 


0.1079 





0797 


pubrec 


1.2814 





2132 


1.2360 





2118 


1.3572 





1976 


denpmi 


4.7761 





5888 


4.4736 





5510 


4.6734 





5538 


selfemp 


0.6645 





2160 


0.6388 





2143 


0.1012 





0912 


married 


-0.3971 





1544 


-0.3799 





1520 


-0.1290 





0831 


hischl 


-1.1721 





4276 


-0.9786 





4315 


-0.0507 





0968 


Ivtmed 


0.4933 





1616 


0.4686 





1614 


0.1451 





0906 


Itvhigh 


1.5686 





3193 


1.4515 





3235 


1.2000 





5342 


ccredS 


-0.6192 





4683 


-0.1229 





3454 


0.0157 





0917 


ccredA 


-0.6872 





6469 


0.0544 





4433 


0.0556 





0954 


ccredb 


-1.7431 





8103 


-0.6463 





5597 


-0.0086 





0906 


ccredG 


-2.1088 


1 


0084 


-0.7534 





6892 


0.0003 





0923 



Table 6: Acceptance rates, inefficiencies (IF) based on the truncated kernel, equivalent 
sample size (ESS=10000/IF) and ECT = IF x time for 100 000 iterations for the logistic 
regression model applied to the home mortgage disclosure act data. 







Inefficiency 


Equivalent sampl 


B size 


ECT 


Algorithm 


A. Rate 


Min 


Median 


Max 


Min 


Median 


Max 


Min 


Median 


Max 




Normal Prior 


RWM 


28.5 


49.974 


52.714 


54.216 


184 


190 


200 


2872 


3030 


3116 


RWM3C 


30.0 


59.084 


61.101 


63.796 


157 


164 


169 


3405 


3521 


3676 


IMH-MN-CL 


69.2 


1.974 


2.040 


2.211 


4522 


4903 


5067 


302 


312 


338 


IMH-TCT-CL 


76.8 


1.612 


1.821 


2.136 


4682 


5490 


6205 


332 


375 


439 


IMH-TCT-CL-A 


63.9 


1.762 


2.503 


4.095 


2442 


3995 


5676 


519 


737 


1206 










Double 


exponential Prior 










RWM 


27.5 


53.770 


57.625 


60.116 


166 


174 


186 


3153 


3379 


3525 


RWM3C 


29.5 


61.348 


63.907 


68.001 


147 


156 


163 


3609 


3760 


4001 


IMH-MN-CL 


63.8 


2.244 


2.401 


2.968 


3369 


4164 


4457 


195 


208 


257 


IMH-TCT-CL 


65.0 


2.266 


2.468 


2.743 


3645 


4052 


4412 


712 


775 


862 


IMH-TCT-CL-A 


65.1 


1.665 


1.865 


2.626 


3807 


5361 


6006 


627 


702 


989 




Mixture of Normals Prior 


RWM 


15.8 


66.962 


96.965 


159.411 


63 


103 


149 


4309 


6240 


10258 


RWM3C 


20.1 


91.179 


104.422 


159.808 


63 


96 


110 


5886 


6741 


10316 


IMH-MN-CL 


22.4 


15.066 


26.898 


109.568 


91 


372 


664 


1108 


1978 


8057 


IMH-TCT-CL 


59.0 


3.041 


3.293 


13.016 


768 


3037 


3288 


1520 


1647 


6508 


IMH-TCT-CL-A 


55.7 


3.096 


3.476 


13.193 


758 


2877 


3230 


1934 


2172 


8241 
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(a) Marginal posterior distribution for fitiack 



1 
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(b) Marginal posterior distribution for Puvhigh 



Figure 2: Two marginal posterior distributions of the logistic regression model applied to 
the home mortgage disclosure act data under mixture of normal prior with t| = 0.01 and 
r| = 10000. 

the degrees of freedom was generated as 1 000, effectively giving a Gaussian copula. 



3.2 Bayesian Quantile Regression 



In classical quantile regression the vector of /3 coefficients is estimated by minimizing the 
function 



n 
i=l 
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for (3 for a given < 5 < 1, where ps{u) = {\u\ + {25 — l)u)/2. We can turn this into a 
Ukehhood function based on the asymmetric Laplace distribution, i.e. 



PS 



(y 1/3, a, x) = n ^(1 - 5)a-^ exp | -ps ( 
i=i ^ ^ 



a 



(12) 



See for example 



Yu and Moveedl (|200ll ) and references therein for a discussion of Bayesian 
quantile regression. In the example below, we use the same three priors for the regression 
parameters as in Section [3. H an IG(0.01, 0.01) prior for a and take r| = 0.0001 and = 1 
for the mixture of normals prior distribution. 



Current population survey 

We consider a data set of full-time male workers given by the U.S. current population 
survey of March 2000. The dependent variable is the logarithm of wage and salary income 
and the covariates a re listed in Table [71 The saniple s ize is 5149 observations. Similar 



data are analyzed by 



Donald. Green and Paarsch 



(|2000l ). We carry out Bayesian quantile 



re gression for this da. ta set for 6 = 0.1. Similar analyses for 5 = 0.5 and 5 = 0.9 are reported 



m 



Silva et al. 



(j2008l ). The target distributions in this example are 24 and 25 dimensional. 
Our initial proposal distributions for the adaptive independent samplers were obtained by 
first running the 3 component adaptive random walk for 20 000 iterations for the normal 
and double exponential priors and 100 000 iterations for the mixture of normals prior. In all 
cases we initialized the random walks using linear regression estimates. Table [8] summarizes 
the marginal posterior distributions of the parameter estimates. 

Table shows the acceptance rates, inefficiencies, the equivalent sample sizes and the 
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Table 7: Covariates for the quantile regressions of log income in CPS data 



age 
age^ 
Education: 
edhsinc 
edpssome 
eddegree 

Coverage: 
duncov 

Employment: 
dpub 

Occupation: 
occwhtp 
occwhto 

Industry: 
indprim 

indtrans 
indtrade 
indserv 
indpuba 

Location: 
smsam 

Region: 
regmw 

regs 
regw 

Marital: 

marnev 
marsdw 

Dependents: 
freluQ 
frelulS 

Race: 
vm 



Age of worker in years; 
Square of age 

1 if incomplete high school and otherwise; 

1 if some post-secondary and otherwise; 

1 if completed college or university and otherwise; 

(Omitted category is completed high school.) 

1 if covered by a union negotiated collective agreement and otherwise; 

1 if works in public sector and otherwise; 

white collar professional (e.g. managers) 
while collar other (e.g. clerical) 
(Omitted category is blue collar.) 

1 if primary and otherwise; 

1 if transport and commn and otherwise; 

1 if trade and otherwise; 

1 if services and otherwise; 

1 if public admin and otherwise; 

(Omitted category is manufacturing sector.) 

1 if location in metropolitan area and otherwise; 

1 if mid- west and otherwise; 
1 if south and otherwise; 
1 if west and otherwise; 
(Omitted category is north east.) 

1 if never married and otherwise; 

1 if separated, divorced or widowed and otherwise; 

(Omitted category is married.) 

1 if kids are with less than 6 years and otherwise; 

1 if kids are between 6 and 18 years and otherwise; and 

1 if visible minority and otherwise. 



equivalent computing times of the adaptive sampling schemes for the three prior distribu- 
tions. The adaptive independent Metropolis-Hastings algorithms have lower inefficiencies 
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Table 8: Summary statistics for the 0.1-quantile regression applied to the current population 
survey data under normal, double exponential (with 9 = log(r)) and mixture of normals 
(with 9 = logit(a;)) prior distributions. 



Parameter 


Normals 


Double Exponential 


Mixture of Normals 


Mean 


S 


. Dev. 


Mean 


S 


. Dev. 


Mean 


S 


. Dev. 


9 


_ 




_ 


-5.43902 





21692 


3.70487 


1 


28372 


a 


0.01032 





00014 


0.01032 





00014 


0.01032 





00014 


intercept 


0.02686 





00825 


0.03108 





00806 


0.03299 





00745 


age 


0.02286 





00420 


0.02056 





00412 


0.01956 





00377 


age^ 


-0.00205 





00052 


-0.00176 





00051 


-0.00164 





00047 


edhsinc 


-0.01448 





00196 


-0.01437 





00188 


-0.01435 





00186 


edpssome 


0.00599 





00161 


0.00571 





00158 


0.00582 





00155 


eddegree 


0.01696 





00191 


0.01657 





00190 


0.01675 





00185 


duncov 


0.01478 





00199 


0.01420 





00207 


0.01424 





00197 


dpub 


-0.00301 





00291 


-0.00168 





00259 


-0.00208 





00270 


occwhtp 


0.02694 





00204 


0.02640 





00198 


0.02586 





00196 


occwhto 


0.00268 





00166 


0.00215 





00158 


0.00200 





00158 


indprim 


-0.00835 





00206 


-0.00764 





00199 


-0.00767 





00198 


indtrans 


-0.00216 





00229 


-0.00138 





00211 


-0.00135 





00222 


indtrade 


-0.01902 





00205 


-0.01803 





00198 


-0.01780 





00195 


indserv 


-0.01753 





00199 


-0.01677 





00192 


-0.01647 





00188 


indpuba 


0.00711 





00420 


0.00631 





00385 


0.00715 





00382 


smsam 


0.00767 





00134 


0.00734 





00134 


0.00748 





00132 


regmw 


0.00442 





00194 


0.00387 





00188 


0.00412 





00191 


regs 


-0.00102 





00167 


-0.00126 





00157 


-0.00119 





00163 


regw 


0.00270 





00197 


0.00208 





00179 


0.00226 





00187 


marnev 


-0.00496 





00168 


-0.00475 





00167 


-0.00508 





00165 


marsdw 


-0.00733 





00216 


-0.00661 





00217 


-0.00687 





00212 


freluG 


-0.00234 





00124 


-0.00207 





00117 


-0.00233 





00120 


frelulS 


0.00157 





00069 


0.00154 





00067 


0.00163 





00067 


nil 


-0.00865 





00218 


-0.00821 





00211 


-0.008-12 





00208 



than the adaptive random walk Metropolis algorithms, and the copula based proposals have 
in general the highest acceptance rates and lowest inefficiencies. 



4 Binary random effects model 

This section considers adaptive sampling in a binary random effects model where the random 
effects are integrated out using importance sampling. However, the same ideas can be 
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Table 9: Acceptance rates, inefficiencies (IF) based on the truncated kernel, equivalent 
sample size (ESS=10000/IF) and ECT = IF x time for 100 000 iterations for the 0.1- 
quantile regression applied to the current population survey data. 







Inefficiency 


Equivalent sampl 


e size 


Timing 


Algorithm 


A. Rate 


Min 


Median 


Max 


Min 


Median 


Max 


Min 


Median 


Max 




Normal Prior 


RWM 


18.0 


93.934 


99.926 


102.211 


98 


100 


106 


2757 


2933 


3000 


RWM3C 


25.2 


88.069 


90.182 


92.733 


108 


111 


114 


2598 


2661 


2736 


IMH-MN-CL 


47.3 


3.855 


4.321 


4.957 


2017 


2315 


2594 


128 


143 


164 


IMH-TCT-CL 


54.2 


3.353 


3.609 


4.049 


2470 


2770 


2982 


849 


914 


1025 


IMH-TCT-CL-A 


51.7 


3.125 


3.642 


4.376 


2285 


2746 


3200 


889 


1036 


1245 










Double 


exponential Prior 










RWM 


15.5 


102.917 


106.161 


108.036 


93 


94 


97 


3551 


3662 


3727 


RWM3C 


22.4 


93.033 


95.986 


98.044 


102 


104 


107 


3227 


3329 


3401 


IMH-MN-CL 


41.6 


4.971 


5.452 


16.263 


615 


1834 


2012 


188 


206 


614 


IMH-TCT-CL 


47.8 


4.273 


4.568 


4.993 


2003 


2189 


2340 


1452 


1552 


1697 


IMH-TCT-CL-A 


46.3 


4.166 


4.623 


5.859 


1707 


2163 


2401 


1732 


1923 


2437 




Mixture of Normals Prior 


RWM 


22.8 


81.008 


83.060 


92.968 


108 


120 


123 


3006 


3083 


3450 


RWM3C 


23.2 


84.279 


86.135 


92.548 


108 


116 


119 


3157 


3226 


3466 


IMH-MN-CL 


46.0 


4.227 


4.688 


8.217 


1217 


2133 


2366 


205 


227 


398 


IMH-TCT-CL 


54.4 


3.507 


3.722 


6.412 


1560 


2687 


2852 


2066 


2193 


3777 


IMH-TCT-CL-A 


49.7 


3.938 


4.365 


9.977 


1002 


2291 


2540 


4939 


5475 


12514 



applied to other random effects models. 

Suppose there are N groups with Jj observations in the iih group, such that the prob- 
ability of the ijth binary response is given by the probit model. 



FT{yij = l\lJ,i,P,Xij) = ^{lJ,i + xJjp), Hi ~iV(0,(7^) i = l,...,N,j = l,...,Ji , 

(13) 

and $(•) is the standard normal cumulative distribution function. Let = {I3,a^) be the 
parameter vector, yj = {yn, . . . , yiJ^), Xj = {xn, . . . , x^j.) are the vectors of observations on 
the iih. group, y is the vector of all the observations and x is the vector of all the covariates. 
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Then 



p{y \0,x) = Y\p{yi I , p{yi I 0,Xi) = / p{yi \ fj,i,0,Xi)p{ni \ 0)dni (14) 



i=l 



We form proposals for the posterior p(6 \ y, x) with the random effects integrated out 
because in many apphcations there are too many random effects to include in the adaptation. 
We use importance sampling to integrate out the random effects with the importance density 
based on previous iterates. Let E{fj,i \ y) and viir(;Uj | y) be estimates of the posterior mean 
and variance of /ij. We use the importance density h{^i) ~ N^E{fii \ y),Kvar(^j | y)^ to 
efficiently integrate out in p^ . with k = 4, using 




(15) 



where the /i^^ are generated from h{^i). We obtain E{^i \ y) by first writing E{^i \ y) 



E{E{ni I yi,0) I y) , so that 




(16) 



1=1 



E{fi, 




piYi I OV\xi) 



1 




^ip(yi I 0",xi)p(^i I 0W)//i(^i) ]h{|JLi)d^l^ 




M 



(17) 



where the 0''' are iterates from the adaptive sampling and p(yj | 0H,Xj) is constructed by 



the right side of (fT5]) . The estimate -E(/if | y) is obtained similarly and var(/Xi | y) is then 
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2 

,2 



computed as E{fl^ \ y) — yE{fj,i \ y) 

We update the importance density after every L accepted values of the adaptive sampling 
scheme, with L given in appendix [Al 



Pap smear data 

We applied the probit random effects model ([13]) to data collected in a discrete choice 
experiment designed to study factors that may determine whether a wo man chooses to 



have a Pap smear test to detect cervical cancer. The study is described bv iFiebig and Hall 



(|2005l ) and is based on = 79 Australian women, where each woman was presented with 
Jj = 32 different scenarios and for each scenario asked whether she would choose to have a 
Pap smear test. Thus, there are 32 repeated binary observations for each woman. Table [TOl 
lists the covariates in the study. 

Table 10: Covariates for the Pap smear data set 
knowgp : 1 if the GP is known to the patient and otherwise; 
sexgp : 1 if the GP is male and otherwise; 

testdue : 1 if the patient is due or overdue for a Pap test and otherwise; and 
drrec : 1 if GP recommends that the patient has a Pap test and otherwise. 
papcost : Cost of the Pap test in AU$ (2 levels). 



We fitted the binary random effects model with 7 parameters and 79 random effects to 
the data with an IG(0.01, 0.01) prior for a^. For the double exponential prior for /3, the 
prior for r is IG(0.01, 0.01). For the mixture of normals prior, we set rj = 0.0001 and 
t1 = 10000. and the prior for uj is uniform on (0,1). In the adaptive sampling scheme we 
generated log o"^ because it was unconstrained. The initial values and proposal distributions 
for the adaptive independent Metropolis-Hastings algorithms were obtained by running the 3 
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component adaptive random walk for 2 000 iterations. To initialize all the adaptive random 
walk algorithms we first used the MATLAB function glmfit to estimate the regression 
coefficients and their standard errors with the random effects set identically to zero. To 
integrate out the random effects in the adaptive random walk proposals we began with the 
proposal importance density h{fii) ~ N(0, 1.5a^), with cr^ = 1 initially. 
Table \TT\ summarizes the posterior distributions of the parameters. 

Table 11: Summary of statistics of the posterior distribution of the probit random effects 
model for the Pap test data under the normal, double exponential (with 9 = log(r)) and 
mixture of normals (with 9 = logit(a;) ) prior distributions. 



Parameter 


Normals 


Double Exponential 


Mixture of Normals 




Mean 


S. Dev. 


Mean 


S. Dev. 


Mean 


S. Dev. 


9 






-0.5683 


0.4488 


-1.0691 


0.8988 




0.5792 


0.2081 


0.5598 


0.2151 


0.5680 


0.1976 


constant 


0.2914 


0.1797 


0.2646 


0.1865 


0.2808 


0.1833 


knowgp 


-0.3052 


0.0633 


-0.2882 


0.0658 


-0.2846 


0.0934 


sexgp 


0.6582 


0.0656 


0.6483 


0.0677 


0.6591 


0.0690 


testdue 


-1.1794 


0.0750 


-1.1688 


0.0807 


-1.1769 


0.0773 


drrec 


-0.4978 


0.0737 


-0.4772 


0.0739 


-0.4941 


0.0725 


papcost 


0.0091 


0.0028 


0.0091 


0.0030 


0.0085 


0.0028 



Table [12] shows the acceptance rates, inefficiencies, equivalent sample sizes, and equiv- 
alent computing times for each algorithm. The copula based sampling schemes have the 
highest acceptance rates and the smallest inefficiency factors, with the antithetic proposal 
being the best for the normal and double exponential priors, where the acceptance rates 
are at least 70%. 



5 Summary 



Our article proposes a new copula based adaptive sampling scheme and a generalization of 
the two component adaptive random walk designed to explore the target space more effi- 
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Table 12: Acceptance rates, inefficiencies (IF) based on the truncated kernel, equivalent 
sample size (ESS=10000/IF) and ECT = IF x time for 100 000 iterations for the probit 
random effects applied to the Pap test data. 







Inefficiency 


Equivalent samp] 


e size 


ECT 


Algorithm 


A. Rate 


Min 


Median 


Max 


Min 


Median 


Max 


Min 


Median 


Max 




Normal Prior 


RWM 


28.6 


18.210 


23.039 


32.306 


310 


434 


549 


618517 


782509 


1097268 


RWM3C 


28.8 


22.623 


24.766 


33.667 


297 


404 


442 


750806 


821899 


1117299 


IMH-MN-CL 


62.1 


2.356 


2.590 


2.999 


3334 


3862 


4244 


77641 


85324 


98824 


IMH-TCT-CL 


71.6 


1.803 


2.086 


2.226 


4493 


4793 


5548 


59834 


69253 


73879 


IMH-TCT-CL-A 


70.9 


0.968 


1.453 


1.893 


5282 


6884 


10329 


32002 


48017 


62580 




Double exponential Prior 


RWM 


26.8 


25.337 


30.855 


39.939 


250 


325 


395 


833790 


1015358 


1314306 


RWM3C 


26.8 


24.894 


30.907 


36.229 


276 


324 


402 


821509 


1019936 


1195573 


IMH-MN-CL 


57.9 


2.656 


2.860 


3.542 


2823 


3502 


3765 


86895 


93583 


115885 


IMH-TCT-CL 


71.8 


1.922 


2.151 


2.514 


3978 


4650 


5202 


63124 


70627 


82551 


IMH-TCT-CL-A 


70.7 


0.974 


1.526 


2.209 


4527 


6555 


10272 


32187 


50462 


73036 




Mixture of Normals Prior 


RWM 


25.4 


27.979 


32.678 


90.248 


111 


307 


357 


1014949 


1185408 


3273811 


RWM3C 


24.2 


26.739 


34.183 


52.536 


190 


293 


374 


1004583 


1284255 


1973767 


IMH-MN-CL 


58.4 


2.369 


3.058 


3.720 


2688 


3271 


4221 


82817 


106888 


130032 


IMH-TCT-CL 


71.5 


2.011 


2.244 


3.364 


2973 


4462 


4971 


73396 


81889 


122746 


IMH-TCT-CL-A 


62.2 


2.365 


2.635 


6.221 


1607 


3794 


4229 


88108 


98204 


231813 



ciently than the proposal of IRoberts and Rosenthall (j2006l ). We studied the performance of 



these sampling schen ies as well as the ad a ptive 



scheme proposed by iGiordani and KohnI (120081 ) which is based on a mixture of normals. 



independent Metropolis-Hastings sampling 



All the sampling schemes performed reliably on the examples studied in the article, but we 
found that the adaptive independent Metropolis-Hastings schemes had inefficiency factors 
that were often much lower and acceptance rates that were much higher than the adaptive 
random walk schemes. The copula based adaptive scheme often had the smallest ineffi- 
ciency factors and highest acceptance rates. For acceptance rates over 70% the antithetic 
version of the copula based approach was the most efficient. Our results suggest that the 
copula based proposal provides an attractive approach to adaptive sampling, especially 



for hi gher dimensions. However, the mixture of normals approach of iGiordani and Kohn 



(|2008l ) also performed well and is useful for complicated and possibly multimodal posterior 
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distributions. 
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A Details of the Simulation 

All the computations were done on Intel Core 2 Quad 2.6 Ghz processors, with 4GB RAM 
(SOOMhz) on a GNU/Linux platform using MATLAB 2007b. However, in the TCT algo- 
rithm we computed the univariate cumulative distribution functions and inverse cumulative 
distribution functions of the t, normal and mixture of normals distributions using MATLAB 
mex files based on the corresponding MATLAB code. In addition, to speed up the compu- 
tation, we tested each marginal for normality using the Jarque-Bera test at the 5% level. If 
normality was not rejected then we fitted a normal density to the marginal. Otherwise, we 
estimated the marginal density by a mixture of normals. 

In Stage 1 of the adaptive sampling schemes IMH-MN-CL and IMH-MN-SA that use 
a multivariate mixture of normals, the number of components (nc) used in the third term 
gsix) of the mixture (jH) is determined by the dimension of the parameter vector (dim) and 
the number of accepted draws (accep) to that stage of the simulation. In particular, nc = 1 
if accep/ dim < 40, nc = 2 if 40 < accep/ dim < 100, nc = 3 if 100 < accep/ dim < 200 and 
nc = 4 if accep /dim > 200. 
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We now give details of the number of iterations, burn-in and updating schedules for 
all the adaptive independent Metropolis-Hastings schemes in the paper. In addition, we 
update the proposal in Stage 1 if in 100 successive iterations the acceptance rate is lower 
than 0.01. 

Logistic regression, HMDA data 

• Normal prior: end of first stage = 5 000; burn-in = 75 000; number of iterations: 
100 000; updates = [50, 100, 150, 200, 300, 500, 700, 1000, 2000, 5000, 10000, 20000, 
30000, 50000, 75000]. 

• Double exponential prior: end of first stage = 5 000; burn-in = 100 000; number of 
iterations: 150 000; updates = [50, 100, 150, 200, 300, 500, 700, 1000, 2000, 5000, 
10000, 20000, 30000, 50000, 75000, 100000]. 

• Mixture of normals prior: end of first stage = 100 000; burn-in = 300 000; number of 
iterations: 400 000; updates = [100, 150, 200, 300, 500, 700, 1000, 2000, 3000, 5000, 
7500, 10000, 15000, 20000, 30000, 50000, 75000, 100000, 125000, 150000, 175000, 
200000, 225000, 250000, 300000]. 

Bayesian quantile regression, CPS data. 

• Normal and double exponential priors, quantiles 0.1, 0.5 and 0.9: end of first stage = 
3 000; burn-in = 150 000; number of iterations: 200 000; updates = [100, 150, 200, 
300, 500, 700, 1000, 2000, 3000, 5000, 7500, 10000, 15000, 20000, 30000, 50000, 75000, 
100000, 150000]. 

• Mixture of normals prior, quantiles 0.1, 0.5 and 0.9: end of first stage = 200 000; 
burn-in = 400 000; number of iterations: 500 000; updates = [100, 150, 200, 300, 500, 
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700, 1000, 2000, 3000, 5000, 7500, 10000, 15000, 20000, 30000, 50000, 75000, 100000, 
125000, 150000, 175000, 200000, 225000, 250000, 275000, 300000, 325000, 350000, 
375000, 400000]. 

Probit random effects model, Pap smear data. For all three priors, end of first stage = 5 000; 
burn-in = 10 000; number of iterations: 20 000; updates = [20, 50, 100, 150, 200, 300, 400, 
500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 2000, 2500, 3000, 3500, 4000, 
4500, 5000, 6000, 7000, 8000, 9000, 10000, 12000, 15000]. 

In this example the importance sampling density is updated every L = 100 iterations. 

We now give the details of the sampling for both adaptive random walk Metropolis 
algorithms. For the HMDA data the number of burn-in iterations was 300 000 and the 
total number of iterations was 500 000 for all three priors. The corresponding numbers for 
the CPS data with normal and double exponential priors are 500 000 and 1000 000, and for 
the mixture of normals prior 1000 000 and 1500 000. The corresponding numbers for the 
Pap smear data are 30 000 and 50 000. 
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